Mining Generalised Emerging Patterns

Authors

  • Xiaoyuan Qian
  • James Bailey
  • Christopher Leckie

Abstract

Emerging Patterns (EPs) are a data mining model for discovering the distinctions inherently present among a collection of datasets. However, current EP mining algorithms do not handle attributes whose values are associated with taxonomies (is-a hierarchies). They are restricted to the leaf-level attribute values of a taxonomy. In this paper, we formally introduce the problem of mining generalised emerging patterns. Given a large dataset in which some attributes are hierarchical, we find emerging patterns that consist of items at any level of the taxonomies. Generalised EPs are more concise and interpretable when used to describe the distinctive characteristics of a class of data. They are also considered more expressive because they include items at higher levels of the hierarchies, which have larger supports than items at the leaf level. We formulate the problem of mining generalised EPs and present an algorithm for this task. Experimental results on ten benchmark datasets show that the discovered generalised patterns, which contain items at higher levels of the hierarchies, have greater support than traditional leaf-level EPs.
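
The abstract's central point is that an item higher in a taxonomy is matched by every transaction that contains any of its descendants, so it has at least as much support. The minimal Python sketch below illustrates this. It is not the authors' mining algorithm. It assumes the standard growth-rate definition of an EP: support in the target class divided by support in the background class, infinite when only the background support is zero, and zero when both supports are zero. The taxonomy, item names and the two toy datasets are hypothetical. Each transaction is extended with the ancestors of its items so that a pattern can mix items from any level of the hierarchy.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical taxonomy: each item maps to its parent in an is-a hierarchy.
TAXONOMY: Dict[str, str] = {
    "apple": "fruit",
    "pear": "fruit",
    "fruit": "food",
    "milk": "dairy",
    "cheese": "dairy",
    "dairy": "food",
}


def ancestors(item: str) -> List[str]:
    """Return all ancestors of an item, nearest first."""
    result: List[str] = []
    while item in TAXONOMY:
        item = TAXONOMY[item]
        result.append(item)
    return result


def extend(transaction: Set[str]) -> FrozenSet[str]:
    """Add every ancestor of every item so higher-level items can be matched."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return frozenset(extended)


def support(pattern: Set[str], dataset: List[Set[str]]) -> float:
    """Fraction of (extended) transactions containing every item of the pattern."""
    matches = sum(1 for t in dataset if pattern <= extend(t))
    return matches / len(dataset)


def growth_rate(pattern: Set[str], background: List[Set[str]],
                target: List[Set[str]]) -> float:
    """Growth rate from background to target (assumed standard EP definition)."""
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_bg == 0:
        return float("inf") if s_tg > 0 else 0.0
    return s_tg / s_bg


# Hypothetical two-class data: leaf-level items only.
target = [{"apple", "milk"}, {"pear", "cheese"}, {"apple", "cheese"}, {"pear"}]
background = [{"apple"}, {"milk"}, {"cheese"}, {"bread"}]

print(support({"apple", "milk"}, target))   # leaf-level pattern: 0.25
print(support({"fruit", "dairy"}, target))  # generalised pattern: 0.75
print(growth_rate({"fruit", "dairy"}, background, target))  # inf
```

With these toy data, the leaf-level pattern {apple, milk} occurs in one of four target transactions, while its generalisation {fruit, dairy} occurs in three. Both are absent from the background class, so both are jumping EPs with infinite growth rate, but the generalised one covers three times as much of the target class. A practical miner would likely also discard patterns that contain an item together with one of its own ancestors, since such patterns are redundant.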

Publication date: 2006